Constrained Maximum Mutual Information Dimensionality Reduction for Language Identification
Authors
Abstract
In this paper we propose Constrained Maximum Mutual Information dimensionality reduction (CMMI), an information-theoretic dimensionality reduction technique. CMMI seeks to maximize the mutual information between the class labels and the projected (lower-dimensional) features, optimized via gradient ascent. Supervised and semi-supervised CMMI are introduced and compared with a state-of-the-art dimensionality reduction technique (Minimum/Maximum Rényi's Mutual Information using the Stochastic Information Gradient; MRMI-SIG) on a language identification (LID) task using the CallFriend corpus, with favorable results. CMMI also handles higher-dimensional data more gracefully than MRMI-SIG, permitting application to datasets for which MRMI-SIG is computationally prohibitive.
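The abstract does not spell out the paper's mutual-information estimator or the exact constraint, so the following is only a minimal sketch of the general idea: gradient ascent on an estimate of I(Y; XW) over a projection matrix W, here using Gaussian entropy estimates and an (assumed) orthonormality constraint on W's columns. The names `cmmi_fit` and `gaussian_entropy`, the numerical gradient, and all parameter choices are illustrative, not the authors' implementation.

```python
import numpy as np

def gaussian_entropy(Z):
    """Differential entropy (nats) of a Gaussian fit to the rows of Z."""
    d = Z.shape[1]
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    # slogdet of 2*pi*e*Sigma, with a small ridge for numerical stability
    _, logdet = np.linalg.slogdet(2 * np.pi * np.e * cov + 1e-9 * np.eye(d))
    return 0.5 * logdet

def cmmi_fit(X, y, d_out, lr=0.1, iters=50, eps=1e-4, seed=0):
    """Gradient ascent on I(Y; XW) = H(XW) - sum_c p(c) H(XW | y=c),
    re-imposing orthonormal columns of W after each step (an assumed
    constraint; the paper's actual constraint may differ)."""
    rng = np.random.default_rng(seed)
    n, d_in = X.shape
    W, _ = np.linalg.qr(rng.standard_normal((d_in, d_out)))
    classes = np.unique(y)

    def mi(W):
        Z = X @ W
        h = gaussian_entropy(Z)
        for c in classes:
            Zc = Z[y == c]
            h -= (len(Zc) / n) * gaussian_entropy(Zc)
        return h

    for _ in range(iters):
        # central-difference numerical gradient; an analytic gradient
        # would be used in practice, this just keeps the sketch short
        grad = np.zeros_like(W)
        for i in range(d_in):
            for j in range(d_out):
                Wp, Wm = W.copy(), W.copy()
                Wp[i, j] += eps
                Wm[i, j] -= eps
                grad[i, j] = (mi(Wp) - mi(Wm)) / (2 * eps)
        W, _ = np.linalg.qr(W + lr * grad)  # ascent step + constraint
    return W
```

On two well-separated classes this drives the projection toward the discriminative subspace; the Gaussian entropy model is only a stand-in for whatever estimator the paper actually uses.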
Similar resources
Dimension reduction for speaker identification based on mutual information
Dimension reduction is a necessary step for speech feature extraction in a speaker identification system. The Discrete Cosine Transform (DCT) or Principal Component Analysis (PCA) is widely used for dimension reduction. By choosing, from the DCT or PCA basis-vector pool, those basis vectors that contribute most to the data-distribution variance or to the reconstruction accuracy of the speech data set, we can transform th...
Second Order Dimensionality Reduction Using Minimum and Maximum Mutual Information Models
Conventional methods used to characterize multidimensional neural feature selectivity, such as spike-triggered covariance (STC) or maximally informative dimensions (MID), are limited to Gaussian stimuli or are only able to identify a small number of features due to the curse of dimensionality. To overcome these issues, we propose two new dimensionality reduction methods that use minimum and max...
Simple and Effective Dimensionality Reduction for Word Embeddings
Word embeddings have become the basic building blocks for several natural language processing and information retrieval tasks. Recently, there has been an emphasis on further improving the pre-trained word vectors through post-processing algorithms. One such area of improvement is the dimensionality reduction of word embeddings. Reducing the size of word embeddings through dimensionality reduct...
Dimensionality reduction based on non-parametric mutual information
In this paper we introduce a supervised linear dimensionality reduction algorithm which finds a projected input space that maximizes the mutual information between input and output values. The algorithm utilizes the recently introduced MeanNN estimator for differential entropy. We show that the estimator is an appropriate tool for the dimensionality reduction task. Next we provide a nonlinear r...
Maximum conditional mutual information projection for speech recognition
Linear discriminant analysis (LDA) in its original model-free formulation is best suited to classification problems with equal-covariance classes. Heteroscedastic discriminant analysis (HDA) removes this equal-covariance constraint, and is therefore more suitable for automatic speech recognition (ASR) systems. However, maximizing the HDA objective function does not correspond directly to minimizing ...
Publication date: 2012